Possible explanation on the effect of variable selection on PAM used with SMOTE

Abstract

In our simulation studies with high-dimensional class-imbalanced data, we observed that in the null case SMOTE had hardly any effect on classification with PAM when all p = 1000 simulated variables were considered. On the other hand, when only a subset of the variables was used (G = 40), SMOTE seemed beneficial in reducing the class-imbalance problem of PAM, decreasing the number of samples classified in the majority class. This behavior can be seen clearly by comparing the left and right panels reporting the PAM results in Figure 2.
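To make the null-case setting above more concrete, here is a minimal Python sketch; it is not the authors' simulation code. It assumes a SMOTE-style interpolation step (each synthetic point lies on the segment between a randomly drawn minority sample and one of its k nearest minority neighbours) and a simplified PAM: the nearest shrunken centroid rule with the shrinkage threshold set to zero, so only the standardised distances and the class-prior term remain. The function names (`smote`, `fit_centroids`, `predict`, `majority_rate`), the sample sizes, the test-set size and the rule used to pick the G variables (largest absolute difference in class means, applied before SMOTE) are all hypothetical choices made for illustration.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Return n_new synthetic minority samples (SMOTE-style interpolation).

    Each synthetic point is x + u * (x_nn - x): x is a minority sample drawn at
    random with replacement, x_nn one of its k nearest minority neighbours
    (Euclidean distance) and u ~ Uniform(0, 1).
    """
    rng = np.random.default_rng(rng)
    n = X_min.shape[0]
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances within the minority class.
    sq = np.sum(X_min ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X_min @ X_min.T
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbours of each sample
    base = rng.integers(0, n, size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    u = rng.random((n_new, 1))
    return X_min[base] + u * (X_min[neigh] - X_min[base])


def fit_centroids(X, y):
    """Class centroids, PAM-style scale (s_j + s0) and class priors (no shrinkage)."""
    classes = np.unique(y)
    cent = np.array([X[y == c].mean(axis=0) for c in classes])
    resid = X - cent[np.searchsorted(classes, y)]
    # Pooled within-class standard deviation of each variable.
    s = np.sqrt((resid ** 2).sum(axis=0) / (len(y) - len(classes)))
    s0 = np.median(s)
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, cent, s + s0, priors


def predict(model, X):
    classes, cent, scale, priors = model
    # delta_k(x) = sum_j (x_j - centroid_kj)^2 / (s_j + s0)^2 - 2 log(pi_k)
    d = (((X[:, None, :] - cent[None, :, :]) / scale) ** 2).sum(axis=2)
    return classes[np.argmin(d - 2.0 * np.log(priors), axis=1)]


def majority_rate(p=1000, G=None, n_maj=80, n_min=20, use_smote=False, seed=0):
    """Fraction of null test samples assigned to the majority class (label 0)."""
    rng = np.random.default_rng(seed)
    # Null case: both classes come from the same N(0, I) distribution.
    X = rng.standard_normal((n_maj + n_min, p))
    y = np.r_[np.zeros(n_maj, int), np.ones(n_min, int)]
    X_test = rng.standard_normal((200, p))
    if G is not None:
        # Hypothetical selection rule: keep the G variables with the largest
        # absolute difference in class means, computed before SMOTE.
        diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
        keep = np.argsort(diff)[-G:]
        X, X_test = X[:, keep], X_test[:, keep]
    if use_smote:
        # Oversample the minority class until both classes have n_maj samples.
        X_syn = smote(X[y == 1], n_maj - n_min, rng=rng)
        X = np.vstack([X, X_syn])
        y = np.r_[y, np.ones(len(X_syn), int)]
    return np.mean(predict(fit_centroids(X, y), X_test) == 0)


if __name__ == "__main__":
    for G in (None, 40):
        for use_smote in (False, True):
            print(G, use_smote, majority_rate(G=G, use_smote=use_smote))
```

Comparing the printed rates for `G=None` (all p variables) with those for `G=40` is analogous to the left/right panel comparison described above, but the numbers depend on the seed and on the assumptions listed here, so they should not be read as a reproduction of Figure 2. Note also that after SMOTE the classes are balanced, so the prior term `-2 log(pi_k)` no longer favours the majority class in this simplified rule.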

Similar resources

Evaluation of Classifiers in Software Fault-Proneness Prediction

The reliability of software depends on its fault-prone modules: the fewer fault-prone units a piece of software contains, the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software's reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...

Preprocessing noisy imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection

The Synthetic Minority Over Sampling TEchnique (SMOTE) is a widely used technique to balance imbalanced data. In this paper we focus on improving SMOTE in the presence of class noise. Many improvements of SMOTE have been proposed, mostly cleaning or improving the data after applying SMOTE. Our approach differs from these approaches by the fact that it cleans the data before applying SMOTE, such...

Simulation of Smoke Emission from Fires in High-Rise Buildings Using the 3D Model Generated from 2-Dimensional Cadastral Data

A 3-dimensional model of high-rise buildings can be used in disaster management, for example in fires, to reduce casualties. The fundamental dilemma in 3D building modeling is the unavailability of suitable data sources. However, available 2D cadastral maps could be used as low-cost and attainable resources for 3D building modeling. Smoke will be a great threat to people's health during a f...

Fuzzy-rough imbalanced learning for the diagnosis of High Voltage Circuit Breaker maintenance: The SMOTE-FRST-2T algorithm

For any electric power system, it is crucial to guarantee a reliable performance of its High Voltage Circuit Breaker (HVCB). Determining when the HVCB needs maintenance is an important and non-trivial problem, since these devices are used over extensive periods of time. In this paper, we propose the use of data mining techniques in order to predict the need of maintenance. In the corresponding ...

A combined SMOTE and PSO based RBF classifier for two-class imbalanced problems

This contribution proposes a powerful technique for two-class imbalanced classification problems by combining the synthetic minority over-sampling technique (SMOTE) and the particle swarm optimisation (PSO) aided radial basis function (RBF) classifier. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, the SMOTE is appli...

Publication date: 2013